Description

[0001] The present invention relates to classification methods and systems and, in particular, to methods and systems for classifying optically acquired character images and to methods and systems for training such systems.

2. Statement of Related Art 

[0002] In the document YOSHIKAZU MIYANAGA ET AL: 'PARALLEL AND ADAPTIVE CLUSTERING METHOD SUITABLE FOR A VLSI SYSTEM', SIGNAL IMAGE AND VIDEO PROCESSING, SINGAPORE, JUNE 11-14, 1991, PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, NEW YORK, IEEE, US, vol. 1, SYMP. 24, 11 June 1991 (1991-06-11), pages 356-359, XP000384785, ISBN 0-7803-0050-5, a two-functional network is proposed in which adaptive methods are implemented for sophisticated recognition and clustering. In the first subnetwork, self-organized clustering is realized; the clustering is based on the Mahalanobis distance. The result of the first subnetwork is a vector of similarity values between a given input pattern and all patterns of the cluster nodes. The second subnetwork determines the optimum label from the similarity network. The second network consists of nodes associated with specific labels. All connections between the label nodes of the second functional network and the cluster nodes of the first functional network are determined by supervised learning. Every calculation is executed in parallel and pipelined forms. Figure 3 of that document shows the block diagram of the control flow. An input pattern is fed into all nodes of the first subnetwork at each instance. In the first step, the similarity values of all nodes are calculated. If a new node should be created, the input pattern is assigned to the new node; if it is not necessary to create a new node, the optimum node is selected from the similarity values.

[0003] In the field of package shipping, packages are routed from origins to destinations throughout the world according to destination addresses typed on shipping labels applied to these packages. In order to route packages, it is desirable to use automated optical character classification systems that can read those addresses. Such a classification system must be able to classify characters as quickly as possible. Conventional optical character classification systems using spherical neurons, such as those disclosed in U.S. Patent No. 4,326,259 (Cooper et al.), may be unable to meet the processing requirements presented by certain applications without a substantial investment in hardware.
[0004] In the document KELLY P M ET AL: 'An adaptive algorithm for modifying hyperellipsoidal decision surfaces', PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), BALTIMORE, JUNE 7-11, 1992, NEW YORK, IEEE, US, vol. 3, 7 June 1992 (1992-06-07), pages 196-201, XP010060109, ISBN 0-7803-0559-0, a learning algorithm for a distance classifier is developed which manipulates hyperellipsoidal cluster boundaries. Regions of the input feature space are first enclosed by ellipsoidal boundaries, and these boundaries are then iteratively modified to reduce classification error in areas of known overlap between different classes. During adaptation, each hyperellipsoid in the classifier maintains its original orientation, although its position and shape are modified. The algorithm used is referred to as the LVQ-MM algorithm (LVQ with the Mahalanobis distance Metric).

SUMMARY OF THE INVENTION 

[0005] The present invention covers a training method and apparatus for creating a new neuron in a feature space having at least one existing neuron. The invention generates a feature vector representative of a training input, where the training input corresponds to one of a plurality of possible outputs. If no existing neuron corresponding to the training input encompasses the feature vector, then the invention creates a new neuron, where the new neuron comprises a boundary defined by two or more neuron axes of different length.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0006] 

Figs. 1(a), 1(b), 1(c), and 1(d) are bitmap representations of a nominal letter "O", a degraded letter "O", a nominal number "7", and a degraded number "7", respectively;

Fig. 2 is a graphical depiction of a 2-dimensional feature space populated with 8 elliptical neurons that may be employed by the classification system of the present invention to classify images of the letters A, B, and C;

Fig. 3 is a process flow diagram for classifying inputs according to a preferred embodiment of the present invention;

Fig. 4 is a schematic diagram of part of the classification system of Fig. 3;

Fig. 5 is a process flow diagram for generating neurons used by the classification system of Fig. 3; and

Fig. 6 is a schematic diagram of a classification system that uses cluster classifiers for classifying inputs according to a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION 

[0007] The present invention includes a system for optical character recognition but, more generally, the invention covers a classification system for classifying an input as one of a defined set of possible outputs. For example, where the input is an optically acquired image representing one of the 26 capital letters of the English alphabet, the classification system of the present invention may be used to select as an output the capital letter that is associated with the input image. The classification system of the present invention is discussed below in connection with Figs. 1(a), 2, 3, and 4.

[0008] The present invention also includes a system for "training" the classification system of the present invention. This training system is preferably operated off line prior to deployment of the classification system. In the character recognition example, the training system accepts input images representative of known characters to "learn" about the set of possible outputs into which unknown images will eventually be classified. The training system of the present invention is discussed below in connection with Fig. 5.

[0009] The present invention also includes a system for training the classification system of the present invention based on ordering the training inputs according to the relative quality of the training inputs. This system for training is discussed below in connection with Figs. 1(a), 1(b), 1(c), and 1(d).

[0010] The present invention also includes a system for adjusting the locations and shapes of neurons generated by the training systems of the present invention.

[0011] The present invention also includes a classification system employing a hierarchical network of top-level and lower-level cluster classifiers. The top-level classifier classifies inputs into one of a plurality of output clusters, where each output cluster is associated with a subset of the set of possible outputs. A cluster classifier, associated with the output cluster identified by the top-level classifier, then classifies the input as corresponding to one of the possible outputs. This classification system is discussed below in connection with Figs. 1(a), 1(b), 1(c), and 1(d).
[0012] The present invention also includes a neural system for classifying inputs that combines two subsystems. One subsystem counts, for each of the possible outputs, the number of neurons that encompass a feature vector representing a particular input. If one of the possible outputs has more neurons encompassing the feature vector than any other possible output, then the system selects that possible output as corresponding to that input. Otherwise, the second subsystem finds the neuron that has the smallest value of a particular distance measure for that feature vector. If that value is less than a specified threshold, then the system selects the output associated with that neuron as corresponding to the input. This neural system is discussed below in connection with Figs. 1(a), 2, 3, and 4.

CLASSIFICATION SYSTEM 

[0013] Referring now to Fig. 1(a), there is shown a bitmap representation of a nominal letter "O". When the classification system of the present invention classifies optically acquired character images, each character image to be classified may be represented by an input bitmap, an $(m \times n)$ image array of binary values as shown in Fig. 1(a). In a preferred embodiment, the classification system of the present invention generates a vector in a k-dimensional feature space from information contained in each input bitmap. Each feature vector F has feature elements $f_j$, where $0 \le j \le k-1$. The dimension of the feature space, k, may be any integer greater than one. Each feature element $f_j$ is a real value corresponding to one of k features derived from the input bitmap.

[0014] The k features may be derived from the input bitmap using conventional feature extraction functions, such as, for example, the Grid or Hadamard feature extraction function. The feature vector F represents a point in the k-dimensional feature space. The feature elements $f_j$ are the components of feature vector F along the feature-space axes of the k-dimensional feature space. For purposes of this specification, the term "feature vector" refers to a point in feature space.
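Paragraph [0014] names the Grid feature extraction function but does not define it. Purely as an illustration, the sketch below assumes a simple zoning scheme: the $(m \times n)$ bitmap is cut into a rows × cols grid, and each feature element $f_j$ is the fraction of set pixels in one cell, so that k = rows · cols. The function name `grid_features` and the zoning rule are assumptions of this sketch, not the specification's definition of the Grid function.

```python
from typing import List, Sequence


def grid_features(bitmap: Sequence[Sequence[int]], rows: int, cols: int) -> List[float]:
    """Hypothetical grid (zoning) feature extractor.

    Splits an m x n binary bitmap into rows x cols cells and returns, in
    row-major order, the fraction of set pixels in each cell. The result is a
    feature vector F with k = rows * cols real-valued elements f_j in [0, 1].
    """
    m, n = len(bitmap), len(bitmap[0])
    if not (0 < rows <= m and 0 < cols <= n):
        raise ValueError("grid must be non-empty and no finer than the bitmap")
    features: List[float] = []
    for r in range(rows):
        # Integer cell boundaries; with rows <= m every cell is non-empty.
        r0, r1 = r * m // rows, (r + 1) * m // rows
        for c in range(cols):
            c0, c1 = c * n // cols, (c + 1) * n // cols
            cell = [bitmap[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            features.append(sum(cell) / len(cell))
    return features


# A 4 x 4 bitmap of a crude letter "O".
letter_o = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
print(grid_features(letter_o, 2, 2))  # [0.5, 0.5, 0.5, 0.5]
```

A discriminant transform such as the one described next would then be applied to vectors of this kind to obtain the final feature space.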

[0015] In a preferred embodiment, a discriminant analysis transform may be applied to Grid-based or Hadamard-based feature vectors to define the feature space. In this embodiment, the separation between possible outputs may be increased, and the dimensionality of the feature vector may be reduced, by performing this discriminant analysis and retaining only the most significant eigenvectors from the discriminant transformation.
[0016] The classification system of the present invention compares a feature vector F, representing a particular input image, to a set of neurons in feature space, where each neuron is a closed k-dimensional region or "hyper-volume" in the k-dimensional feature space. For example, when (k = 2), each neuron is an area in a 2-dimensional feature space, and when (k = 3), each neuron is a volume in a 3-dimensional feature space. Fig. 2 shows a graphical depiction of an exemplary 2-dimensional feature space populated with eight 2-dimensional neurons.

EP1 197 914B1

[0017] In a preferred classification system according to the present invention, the boundary of at least one of the neurons populating a k-dimensional feature space is defined by at least two axes that have different lengths. Some of these neurons may be generally represented mathematically as:

$$\sum_{j=0}^{k-1} \left( \frac{|g_j - c_j|}{b_j} \right)^{m} \le A \qquad (1)$$

where $c_j$ define the center point of the neuron, $b_j$ are the lengths of the neuron axes, and m and A are positive real constants. In a preferred embodiment, at least two of the neuron axes are of different length. The values $g_j$ that satisfy Equation (1) define the points in feature space that lie within or on the boundary of the neuron. Those skilled in the art will understand that other neurons within the scope of this invention may be represented by other mathematical expressions. For example, a neuron may be defined by the expression:

$$\mathrm{MAX}_{j} \left( \frac{|g_j - c_j|}{b_j} \right) \le A \qquad (2)$$

where the function "MAX" computes the maximum value of the ratio as j runs from 0 to k-1. Neurons defined by Equation (2) are hyper-rectangles.

[0018] In a preferred embodiment of the present invention, the neurons are hyper-ellipses in the k-dimensional feature space. A hyper-ellipse is any hyper-volume defined by Equation (1), where (m = 2) and (A = 1). More particularly, a hyper-ellipse is defined by the function:

$$\sum_{j=0}^{k-1} \frac{(g_j - c_j)^2}{b_j^2} \le 1 \qquad (3)$$

where $c_j$ define the hyper-ellipse center point, $b_j$ are the hyper-ellipse axis lengths, and the values $g_j$ that satisfy Equation (3) define the points that lie within or on the hyper-ellipse boundary. When all of the axes are the same length, the hyper-ellipse is a hyper-sphere. In a preferred embodiment of the present invention, in at least one of the neurons, at least two of the axes are of different length. By way of example, Fig. 2 shows elliptical neuron 1, having center point $(c_0^1, c_1^1)$ and axes $b_0^1$, $b_1^1$ of different length. In a preferred embodiment, the axes of the neurons are aligned with the coordinate axes of the feature space. Those skilled in the art will understand that other neurons having axes that do not all align with the feature-space axes are within the scope of the invention.
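The membership tests of Equations (1) to (3) can be written down directly. This is a minimal sketch, assuming axis-aligned neurons given by a center $(c_0, \ldots, c_{k-1})$ and axis lengths $(b_0, \ldots, b_{k-1})$; the helper names are hypothetical, and the defaults m = 2 and A = 1 reproduce the hyper-ellipse of Equation (3). The bound A in the hyper-rectangle test mirrors Equation (2) as written above.

```python
from typing import List, Sequence


def _ratios(g: Sequence[float], c: Sequence[float], b: Sequence[float]) -> List[float]:
    """Per-axis ratios |g_j - c_j| / b_j for a point g and a neuron (c, b)."""
    if not (len(g) == len(c) == len(b)):
        raise ValueError("point, center and axes must have the same dimension k")
    return [abs(gj - cj) / bj for gj, cj, bj in zip(g, c, b)]


def in_power_neuron(g, c, b, m: float = 2.0, A: float = 1.0) -> bool:
    """Equation (1); with m = 2 and A = 1 it is the hyper-ellipse of Equation (3)."""
    return sum(t ** m for t in _ratios(g, c, b)) <= A


def in_rect_neuron(g, c, b, A: float = 1.0) -> bool:
    """Equation (2): the largest per-axis ratio is at most A (a hyper-rectangle)."""
    return max(_ratios(g, c, b)) <= A


center, axes = (0.0, 0.0), (2.0, 1.0)  # elliptical neuron with unequal axes
print(in_power_neuron((1.5, 0.0), center, axes))  # True
print(in_power_neuron((1.5, 0.8), center, axes))  # False: 0.5625 + 0.64 > 1
print(in_rect_neuron((1.5, 0.8), center, axes))   # True: max(0.75, 0.8) <= 1
```

The point (1.5, 0.8) lies outside the hyper-ellipse but inside the hyper-rectangle with the same center and axes, because Equation (2) bounds each axis separately instead of summing the squared ratios.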

[0019] According to the present invention, each neuron is associated with a particular possible output. For example, each neuron may correspond to one of the 26 capital letters. Each neuron is associated with only one of the possible outputs (e.g., letters), but each possible output may have one or more associated neurons. Furthermore, neurons may overlap one another in feature space. For example, as shown in Fig. 2, neurons 0, 1, and 7 correspond to the character "A", neurons 2, 3, 5, and 6 correspond to the character "B", and neuron 4 corresponds to the character "C". Neurons 1 and 7 overlap, as do neurons 2, 3, and 6 and neurons 3, 5, and 6. In an alternative embodiment (not shown), neurons corresponding to different possible outputs may overlap. The classification system of the present invention may employ the neurons of Fig. 2 to classify input images representative of the letters A, B, and C.

[0020] Referring now to Fig. 3, there is shown a process flow diagram of classification system 300 for classifying an input (e.g., a bitmap of an optically acquired character image) as one of a set of possible outputs (e.g., characters) according to a preferred embodiment of the present invention. In the preferred embodiment shown in Fig. 3, the neurons in classification system 300 are processed in parallel. In an alternative embodiment (not shown), the neurons of classification system 300 may be processed in series. Means 302 is provided for receiving an input image bitmap and generating a feature vector that represents information contained in that bitmap. Means 304 and 306 are provided for comparing the feature vector generated by means 302 to a set of neurons, at least one of which has two or more axes of different length. Classification system 300 selects one of the possible outputs based upon that comparison.
[0021] In a preferred embodiment of the present invention, classification system 300 classifies optically acquired character bitmaps using a network of hyper-elliptical neurons. Means 302 of classification system 300 receives as input the bitmap of an optically acquired character image to be classified and generates a corresponding feature vector F. Means 304 then determines an "elliptical distance" $r_x$ as a function of the center and axes of each of the hyper-elliptical neurons x in the network and feature vector F, where:

$$r_x = \sum_{j=0}^{k-1} \frac{(f_j - c_j^x)^2}{(b_j^x)^2} \qquad (4)$$

In Equation (4), $c_j^x$ and $b_j^x$ define the center point and axis lengths, respectively, of neuron x, where x runs from 0 to $E_{num} - 1$, and $f_j$ are the elements of feature vector F. Those skilled in the art will recognize that distance measures different from that of Equation (4) may also be used.

[0022] Means 306 determines which, if any, of the neurons encompass feature vector F. A neuron encompasses a feature vector (and may be referred to as an "encompassing neuron") if the feature vector lies inside the boundary that defines the neuron in feature space. For hyper-ellipses, neuron x encompasses feature vector F if $(r_x < 1)$. If $(r_x = 1)$, feature vector F lies on the boundary of neuron x, and if $(r_x > 1)$, feature vector F lies outside neuron x. Since neurons may overlap in feature space, a particular feature vector may be encompassed by more than one neuron. In Fig. 2, feature vector $F_A$, corresponding to a particular input image, is encompassed by neurons 2 and 6. Alternatively, a feature vector may lie inside no neurons, as in the case of feature vector $F_B$ of Fig. 2, which corresponds to a different input image.
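Equation (4) and the inside/boundary/outside test of [0022] translate into a few lines. This is a sketch under the assumption that each neuron is stored as a (center, axes) pair; all names are hypothetical, and the three example neurons are invented rather than taken from Fig. 2.

```python
from typing import List, Sequence, Tuple

Neuron = Tuple[Sequence[float], Sequence[float]]  # (center C^x, axes B^x)


def elliptical_distance(f: Sequence[float], center: Sequence[float],
                        axes: Sequence[float]) -> float:
    """Equation (4): r_x = sum_j (f_j - c_j^x)**2 / (b_j^x)**2."""
    if not (len(f) == len(center) == len(axes)):
        raise ValueError("dimension mismatch")
    return sum((fj - cj) ** 2 / bj ** 2 for fj, cj, bj in zip(f, center, axes))


def position(r: float) -> str:
    """Where F lies relative to a hyper-elliptical neuron, following [0022]."""
    if r < 1.0:
        return "inside"
    if r == 1.0:
        return "on boundary"
    return "outside"


def encompassing_neurons(f: Sequence[float], neurons: Sequence[Neuron]) -> List[int]:
    """Indices x of the neurons that encompass F, i.e. those with r_x < 1."""
    return [x for x, (c, b) in enumerate(neurons) if elliptical_distance(f, c, b) < 1.0]


neurons = [((0.0, 0.0), (2.0, 1.0)), ((1.0, 0.0), (1.0, 1.0)), ((5.0, 5.0), (1.0, 1.0))]
print(encompassing_neurons((0.5, 0.0), neurons))               # [0, 1]
print(position(elliptical_distance((2.0, 0.0), *neurons[0])))  # on boundary
```

Because the two first neurons overlap, the feature vector (0.5, 0.0) is encompassed by both, just as $F_A$ is encompassed by two neurons in Fig. 2.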

[0023] Means 308 finds the "closest" neuron for each possible output. As described earlier, each neuron is associated with one and only one possible output, but each possible output may have one or more neurons associated with it. Means 308 analyzes all of the neurons associated with each possible output and determines the neuron "closest" to feature vector F for that output. The "closest" neuron is the one having the smallest "distance" measure value $r_x$. In the example of feature vector $F_A$ of Fig. 2, means 308 will select neuron 1 as the "closest" neuron to feature vector $F_A$ for the character "A". It will also select neuron 2 as the "closest" neuron for character "B" and neuron 4 for character "C".

[0024] Means 310 in Fig. 3 counts votes for each possible output. In a first preferred embodiment, each neuron that encompasses feature vector F is treated by means 310 as a single "vote" for the output associated with that neuron. In an alternative preferred embodiment, discussed in greater detail with respect to Equation (7) below, each neuron that encompasses feature vector F is treated by means 310 as representing a "weighted vote" for the output associated with that neuron, where the weight associated with any particular neuron is a function of the number of training input feature vectors encompassed by that neuron. In a preferred embodiment, means 310 implements proportional voting, where the weighted vote for a particular neuron is equal to the number of feature vectors encompassed by that neuron. For each possible output, means 310 tallies all the votes for all the neurons that encompass feature vector F. There are three potential types of voting outcome: either (1) one output character receives more votes than any other output character, (2) two or more output characters tie for the most votes, or (3) all output characters receive no votes, indicating the situation where no neurons encompass feature vector F. In Fig. 2, feature vector $F_A$ may result in the first type of voting outcome: character "B" may receive 2 votes corresponding to encompassing neurons 2 and 6, while characters "A" and "C" receive no votes. Feature vector $F_B$ of Fig. 2 results in the third type of voting outcome, with each character receiving no votes.
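The vote counting of means 310 and the three outcome types can be sketched as follows. The neuron-to-output assignment `FIG2_LABELS` follows [0019], and the encompassing set {2, 6} for $F_A$ follows [0022]; the function names and the optional `weights` argument (standing in for proportional voting, with invented values in the tests) are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

# Output labels of the eight neurons of Fig. 2 (list index = neuron number).
FIG2_LABELS = ["A", "A", "B", "B", "C", "B", "B", "A"]


def tally_votes(encompassing: Sequence[int], labels: Sequence[str],
                weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Sum votes per output over the neurons that encompass F.

    weights=None gives one vote per neuron; otherwise weights[x] is the
    (proportional) vote of neuron x, e.g. its count of training vectors.
    """
    votes: Counter = Counter()
    for x in encompassing:
        votes[labels[x]] += 1 if weights is None else weights[x]
    return dict(votes)


def voting_outcome(votes: Dict[str, float]) -> Tuple[int, Optional[str]]:
    """Classify the tally into outcome type (1), (2) or (3) of [0024]."""
    if not votes or max(votes.values()) <= 0:
        return 3, None                      # no neuron encompasses F
    top = max(votes.values())
    leaders = [out for out, v in votes.items() if v == top]
    if len(leaders) == 1:
        return 1, leaders[0]                # unique vote-leader
    return 2, None                          # tie for the most votes


print(voting_outcome(tally_votes([2, 6], FIG2_LABELS)))  # (1, 'B'), like F_A
print(voting_outcome(tally_votes([], FIG2_LABELS)))      # (3, None), like F_B
```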

[0025] Means 312 determines whether the first type of voting outcome resulted from the application of means 310 to feature vector F. If only one of the possible output characters received the most votes, then means 312 directs the processing of classification system 300 to means 314, which selects that output character as corresponding to the input character bitmap. Otherwise, processing continues to means 316. For feature vector $F_A$ in Fig. 2, means 312 determines that character "B" has more votes than any other character and directs means 314 to select "B" as the character corresponding to feature vector $F_A$. For feature vector $F_B$ in Fig. 2, means 312 determines that no single character received the most votes and directs processing to means 316.

[0026] Means 316 acts as a tie-breaker for the second and third potential voting outcomes, in which no outright vote-leader exists, either because of a tie or because the feature vector lies inside no neurons. To break the tie, means 316 selects the neuron x that is "closest" in elliptical distance to feature vector F and compares $r_x$ to a specified threshold value $\theta^m$. If $(r_x < \theta^m)$, then means 318 selects the output character associated with neuron x as corresponding to the input character bitmap. Otherwise, the tie is not broken and classification system 300 selects no character for the input image. A "no-character-selected" result is one of the possible outputs from classification system 300. For example, if classification system 300 is designed to recognize capital letters and the input image corresponds to the number "7", a no-character-selected result is an appropriate output.

[0027] Threshold value $\theta^m$ may be any number greater than 1 and is preferably about 1.25. As described earlier, when feature vector F is inside neuron x, then $(r_x < 1)$, and when feature vector F is outside neuron x, then $(r_x > 1)$. If the voting result from means 310 is a tie for the most non-zero votes, then means 316 will select the output character associated with the encompassing neuron having a center that is "closest" in elliptical "distance" to feature vector F. Alternatively, if there are no encompassing neurons, means 316 may still classify the input bitmap as corresponding to the output character associated with the "closest" neuron x, if $(r_x < \theta^m)$. Using a threshold value $\theta^m$ of about 1.25 establishes a region surrounding each neuron used by means 316 for tie-breaking. In Fig. 2, feature vector $F_B$ will be classified as character "C" if the "distance" measure $r_4$ is less than the threshold value $\theta^m$; otherwise, no character is selected.
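Putting means 304 through 318 together, a minimal end-to-end sketch of the decision rule might look like this. It assumes neurons stored as (center, axes, label) triples, uses the preferred threshold $\theta^m$ = 1.25, and returns `None` for the no-character-selected result; the names and example neurons are hypothetical. As in [0026], the fallback takes the closest neuron overall; when the votes are tied, that neuron necessarily encompasses F, since at least one neuron with $r_x < 1$ exists.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Neuron = Tuple[Sequence[float], Sequence[float], str]  # (center, axes, output label)


def elliptical_distance(f, center, axes) -> float:
    """Equation (4)."""
    return sum((fj - cj) ** 2 / bj ** 2 for fj, cj, bj in zip(f, center, axes))


def classify(f: Sequence[float], neurons: Sequence[Neuron], theta_m: float = 1.25,
             weights: Optional[Sequence[float]] = None) -> Optional[str]:
    """Voting with tie-breaking as in means 304-318; None = no character selected."""
    dists: List[float] = [elliptical_distance(f, c, b) for c, b, _ in neurons]
    votes: Counter = Counter()
    for x, (_, _, label) in enumerate(neurons):
        if dists[x] < 1.0:                           # neuron x encompasses F
            votes[label] += 1 if weights is None else weights[x]
    if votes:
        top = max(votes.values())
        leaders = [out for out, v in votes.items() if v == top]
        if len(leaders) == 1:                        # means 312 -> means 314
            return leaders[0]
    # Means 316: tie or no votes, so fall back to the closest neuron overall.
    closest = min(range(len(neurons)), key=dists.__getitem__)
    return neurons[closest][2] if dists[closest] < theta_m else None


neurons = [((0.0, 0.0), (2.0, 1.0), "A"), ((3.0, 0.0), (1.0, 1.0), "B"),
           ((0.0, 3.0), (1.0, 1.0), "C")]
print(classify((0.5, 0.0), neurons))  # A     (unique vote-leader)
print(classify((0.0, 4.1), neurons))  # C     (r about 1.21, below 1.25)
print(classify((0.0, 4.2), neurons))  # None  (r about 1.44, nothing selected)
```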

[0028] Referring now to Fig. 4, there is shown a schematic diagram of classification system 400 of the present invention for classifying inputs as corresponding to a set of s possible outputs. Classification system 400 may perform part of the processing performed by classification system 300 of Fig. 3. Classification system 400 accepts feature vector F, represented by feature elements $(f_0, f_1, \ldots, f_{k-1})$, and generates values $q^t$ and $q^m$ that act as pointers and/or flags to indicate the possible output to be selected. Classification system 400 includes four subsystem levels: input level 402, processing level 404, output level 406, and postprocessing level 408.

[0029] Input level 402 includes the set I of k input processing units $i_j$, where j runs from 0 to k-1. Each input processing unit $i_j$ receives as input one and only one element $f_j$ of the feature vector F and broadcasts this value to processing level 404. Input level 402 functions as a set of pass-through, broadcasting elements.

[0030] Processing level 404 includes the set E of elliptical processing units $e_x$, where x runs from 0 to $E_{num} - 1$. Each elliptical processing unit $e_x$ is connected to and receives input from the output of every input processing unit $i_j$ of input level 402. Elliptical processing unit $e_x$ implements Equation (4) for neuron x of classification system 300 of Fig. 3. Like neuron x of classification system 300, each elliptical processing unit $e_x$ is defined by two vectors of internal parameters: $B^x$ and $C^x$. The elements of vector $B^x$ are the lengths of the axes of neuron x, where:

$$B^x = (b_0^x, b_1^x, \ldots, b_{k-1}^x)^T \qquad (5)$$

and the elements of vector $C^x$ are the coordinates of the center point of neuron x, where:

$$C^x = (c_0^x, c_1^x, \ldots, c_{k-1}^x)^T \qquad (6)$$

[0031] Each elliptical processing unit $e_x$ of processing level 404 computes the distance measure $r_x$ from feature vector F to the center of neuron x. Processing level 404 is associated with means 304 of classification system 300. If $(r_x < 1)$, then elliptical processing unit $e_x$ is said to be activated; otherwise, elliptical processing unit $e_x$ is not activated. In other words, elliptical processing unit $e_x$ is activated when neuron x encompasses feature vector F. Each elliptical processing unit $e_x$ broadcasts the computed distance measure $r_x$ to only two output processing units of output level 406.

[0032] Output level 406 includes two parts: output-total part 410 and output-minimize part 412. Output-total part 410 contains the set of s output processing units $o_n^t$, and output-minimize part 412 contains the set of s output processing units $o_n^m$, where n runs from 0 to s-1, and s is also the number of possible outputs for which classification system 400 has been trained. For example, when classifying capital letters, s = 26. Each processing unit pair $(o_n^t, o_n^m)$ is associated with only one possible output and vice versa.

[0033] Each elliptical processing unit $e_x$ of processing level 404 is connected to and provides output to only one output processing unit $o_n^t$ of output-total part 410 and to only one output processing unit $o_n^m$ of output-minimize part 412. However, each output processing unit $o_n^t$ and each output processing unit $o_n^m$ may be connected to and receive input from one or more elliptical processing units $e_x$ of processing level 404. These relationships are represented by connection matrices $W^t$ and $W^m$, both of which are of dimension $(s \times E_{num})$. In a preferred embodiment, if there is a connection between elliptical processing unit $e_x$ of processing level 404 and output processing unit $o_n^t$ of output-total part 410 of output level 406, entry $w_{nx}^t$ of connection matrix $W^t$ has a value equal to the number of training input feature vectors encompassed by neuron x; otherwise, it has the value 0. In a further preferred embodiment, entry $w_{nx}^t$ has the value 1 if there is a connection between elliptical processing unit $e_x$ and output processing unit $o_n^t$.
[0034] Connection matrix $W^m$ represents the connections between processing level 404 and output-minimize part 412 of output level 406 and is related to connection matrix $W^t$. An entry $w_{nx}^m$ in connection matrix $W^m$ has a value of 1 for every entry $w_{nx}^t$ in connection matrix $W^t$ that is not zero. Otherwise, entry $w_{nx}^m$ has a value of 0.
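The construction of connection matrices $W^t$ and $W^m$ in [0033] and [0034] can be illustrated with nested lists. This sketch assumes the neuron-to-output labels and the per-neuron counts of encompassed training vectors are already known; `connection_matrices`, its arguments and the example values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def connection_matrices(labels: Sequence[str], outputs: Sequence[str],
                        counts: Sequence[int]) -> Tuple[List[List[int]], List[List[int]]]:
    """Build W^t and W^m, each of dimension s x E_num.

    labels[x] is the output of neuron x, and counts[x] is the number of training
    feature vectors it encompasses (pass all ones for the unit-weight embodiment).
    """
    index: Dict[str, int] = {out: n for n, out in enumerate(outputs)}
    s, e_num = len(outputs), len(labels)
    w_t = [[0] * e_num for _ in range(s)]
    for x, label in enumerate(labels):
        w_t[index[label]][x] = counts[x]    # each neuron feeds exactly one o_n^t
    # w^m_{nx} = 1 exactly where w^t_{nx} is non-zero.
    w_m = [[1 if w != 0 else 0 for w in row] for row in w_t]
    return w_t, w_m


w_t, w_m = connection_matrices(["A", "B", "A"], ["A", "B"], counts=[3, 1, 2])
print(w_t)  # [[3, 0, 2], [0, 1, 0]]
print(w_m)  # [[1, 0, 1], [0, 1, 0]]
```

Each column has exactly one non-zero entry, reflecting the rule that every elliptical processing unit feeds a single output pair.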
[0035] Each output processing unit $o_n^t$ in output-total part 410 computes an output value $O_n^t$, where:






$$O_n^t = \sum_{x=0}^{E_{num}-1} w_{nx}^t \, T(r_x) \qquad (7)$$

where the function $T(r_x)$ returns the value 0 if $(r_x > 1)$; otherwise, it returns the value 1. In other words, the function $T(r_x)$ returns the value 1 if elliptical processing unit $e_x$ of processing level 404 is activated. Output processing unit $o_n^t$ counts the votes for the possible output with which it is associated and outputs the total. Output-total part 410 of output level 406 is associated with means 306 and means 310 of classification system 300.

[0036] Similarly, each output processing unit $o_n^m$ in output-minimize part 412 computes an output value $O_n^m$, where:

$$O_n^m = \min_{x \,:\, w_{nx}^m = 1} \left( w_{nx}^m \, r_x \right) \qquad (8)$$

where the function "MIN" returns the minimum value of $(w_{nx}^m r_x)$ over the elliptical processing units $e_x$ connected to $o_n^m$ (those with $w_{nx}^m = 1$). Therefore, each output processing unit $o_n^m$ examines each of the elliptical processing units $e_x$ to which it is connected and outputs a real value equal to the minimum output value from these elliptical processing units. Output-minimize part 412 of output level 406 is associated with means 308 of classification system 300.
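Equations (7) and (8) amount to a masked weighted sum and a masked minimum over the distances $r_x$. A sketch with plain lists, using matrices shaped as in [0033]; the function names and example values are hypothetical, and returning +inf for an output with no connected neuron is an assumption the text does not address.

```python
import math
from typing import List, Sequence


def output_totals(w_t: Sequence[Sequence[float]], r: Sequence[float]) -> List[float]:
    """Equation (7): O_n^t = sum_x w_nx^t * T(r_x), with T(r) = 0 if r > 1 else 1."""
    t = [0 if rx > 1.0 else 1 for rx in r]
    return [sum(w * tx for w, tx in zip(row, t)) for row in w_t]


def output_minima(w_m: Sequence[Sequence[int]], r: Sequence[float]) -> List[float]:
    """Equation (8): O_n^m = min of r_x over the units e_x with w_nx^m = 1.

    An output with no connected neuron gets +inf (an assumption of this sketch).
    """
    return [min((rx for w, rx in zip(row, r) if w == 1), default=math.inf)
            for row in w_m]


w_t = [[3, 0, 2], [0, 1, 0]]
w_m = [[1, 0, 1], [0, 1, 0]]
r = [0.5, 0.8, 1.7]               # distances r_x from Equation (4)
print(output_totals(w_t, r))      # [3, 1]
print(output_minima(w_m, r))      # [0.5, 0.8]
```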

[0037] Postprocessing level 408 includes two postprocessing units, $p^t$ and $p^m$. Postprocessing unit $p^t$ is connected to and receives input from every output processing unit $o_n^t$ of output-total part 410 of output level 406. Postprocessing unit $p^t$ finds the output processing unit $o_n^t$ that has the maximum output value and generates the value $q^t$. If output processing unit $o_n^t$ of output-total part 410 has an output value greater than those of all the other output processing units of output-total part 410, then the value $q^t$ is set to n, the index for that output processing unit. For example, when classifying capital letters, n may be 0 for "A", 1 for "B", etc. Otherwise, the value $q^t$ is set to -1 to indicate that output-total part 410 of output level 406 did not classify the input. Postprocessing unit $p^t$ of postprocessing level 408 is associated with means 312 of classification system 300.

[0038] Similarly, postprocessing unit $p^m$, the other postprocessing unit in postprocessing level 408, is connected to and receives input from every output processing unit $o_n^m$ of output-minimize part 412 of output level 406. Postprocessing unit $p^m$ finds the output processing unit $o_n^m$ that has the minimum output value and generates the value $q^m$. If output processing unit $o_n^m$ of output-minimize part 412 has an output value less than a specified threshold $\theta^m$, then the value $q^m$ is set to the corresponding index n. Otherwise, the value $q^m$ is set to -1 to indicate that output-minimize part 412 of output level 406 did not classify the input, because the feature vector F is outside the threshold region surrounding neuron x for all neurons x. The threshold $\theta^m$ may be the same threshold $\theta^m$ used in classification system 300 of Fig. 3. Postprocessing unit $p^m$ of postprocessing level 408 is associated with means 316 of classification system 300.

[0039] Classification of the input is completed by analyzing the values $q^t$ and $q^m$. If $(q^t \ne -1)$, then the input is classified as possible output $q^t$ of the set of s possible outputs. If $(q^t = -1)$ and $(q^m \ne -1)$, then the input is classified as possible output $q^m$ of the set of s possible outputs. Otherwise, if both values are -1, then the input is not classified as any of the s possible outputs.
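The postprocessing of [0037] to [0039] reduces to choosing between $q^t$ and $q^m$. In this sketch (hypothetical name `postprocess`), ties in the minimum are resolved by lowest index, and the extra requirement that the winning total be positive is an assumption added so that a single output with zero votes is not selected; neither point is specified in the text.

```python
from typing import Sequence


def postprocess(totals: Sequence[float], minima: Sequence[float],
                theta_m: float = 1.25) -> int:
    """Return the index of the selected output, or -1 if none ([0037]-[0039])."""
    # p^t: q^t = n if o_n^t has a strictly larger total than every other unit.
    top = max(totals)
    leaders = [n for n, v in enumerate(totals) if v == top]
    # Requiring top > 0 is an assumption so a lone output with no votes is not picked.
    q_t = leaders[0] if len(leaders) == 1 and top > 0 else -1
    # p^m: q^m = index of the smallest O_n^m if it is below the threshold.
    n_min = min(range(len(minima)), key=minima.__getitem__)  # ties: lowest index
    q_m = n_min if minima[n_min] < theta_m else -1
    # [0039]: prefer q^t, fall back to q^m, else not classified (-1).
    return q_t if q_t != -1 else q_m


print(postprocess([3, 1], [0.5, 0.8]))  # 0  (vote winner)
print(postprocess([1, 1], [0.5, 0.3]))  # 1  (tie broken by closest neuron)
print(postprocess([0, 0], [1.3, 2.0]))  # -1 (outside every threshold region)
```

The inputs here are the $O_n^t$ and $O_n^m$ values produced by output level 406.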

TRAINING SYSTEM 

[0040] A neural network must be trained before it may be used to classify inputs. The training system of the present invention performs this required training by generating at least one non-spherical neuron in the k-dimensional feature space. The training system is preferably implemented off line prior to the deployment of a classification system.
[0041] The training system of the present invention generates neurons based upon a set of training inputs, where each training input is known to correspond to one of the possible outputs in the classification set. Continuing with the example of capital letters used to describe classification system 300, each training input may be a bitmap corresponding to one of the characters from "A" to "Z". Each character must be represented by at least one training input, although typically 250 to 750 training inputs are used for each character.

[0042] Referring now to Fig. 5, there is shown a process flow diagram of training system 500 for generating neurons in k-dimensional feature space that may be used in classification system 300 of Fig. 3 or in classification system 400 of Fig. 4. For example, when training for output classification, training system 500 sequentially processes a set of training bitmap inputs corresponding to known outputs. At a particular point in the training, there will be a set of existing feature vectors that correspond to the training inputs previously processed and a set of existing neurons that have been generated from those existing feature vectors. For each training input, training system 500 generates a feature vector in a feature space that represents information contained in that training input.

[0043] Training system 500 applies two rules in processing each training input. The first training rule is that if the feature vector corresponding to the training input currently being processed is encompassed by any existing neurons that are associated with a different known output, then the boundaries of those existing neurons are spatially adjusted to exclude that feature vector, that is, to ensure that the feature vector is not inside the boundary of those existing neurons. Otherwise, neurons are not spatially adjusted. For example, if the current training input corresponds to the character "R" and the feature vector corresponding to that training input is encompassed by two existing "P" neurons and one existing "B" neuron, then the boundaries of these three existing neurons are spatially adjusted to ensure they do not encompass the current feature vector.

[0044] The second training rule is that if the current feature vector is not encompassed by at least one existing neuron that is associated with the same known output, then a new neuron is created. Otherwise, no new neuron is created for the current feature vector. For example, if the current training input corresponds to the character "W" and the feature vector corresponding to that training input is not encompassed by any existing neuron that is associated with the character "W", then a new "W" neuron is created to encompass that current feature vector. In a preferred embodiment, a new neuron is created by generating a temporary hyper-spherical neuron and then spatially adjusting that temporary neuron to create the new neuron. In an alternative preferred embodiment, the temporary neuron may be a non-spherical hyper-ellipse.

[0045] In a preferred embodiment of the present invention, training system 500 generates hyper-elliptical neurons from a set of training bitmap inputs corresponding to known characters. Training system 500 starts with no existing feature vectors and no existing neurons. Processing of training system 500 begins with means 502, which selects as the current training input a first training input from a set of training inputs. Means 504 generates the feature vector F that corresponds to the current training input.

[0046] When the first training input is the current training input, there are no existing neurons and therefore no existing neurons that encompass feature vector F. In that case, processing of training system 500 flows to means 514, which creates a new neuron centered on feature vector F. The new neuron is preferably defined by Equation (3), where all the new neuron axes are set to the same length, that is, $b_j = \lambda$ for all $j$. Since the new neuron axes are all the same length, the new neuron is a hyper-sphere in feature space of radius $\lambda$. In a preferred embodiment, the value of constant $\lambda$ may be twice as large as the largest feature element $f_j$ of all the feature vectors F for the entire set of training inputs. Since there are no existing feature vectors when processing the first training input, training system 500 next flows to means 528, from which point the processing of training system 500 may be described more generally.
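The initialisation in [0046] can be sketched in Python as below. This is a minimal illustration under assumptions the specification does not state: neurons are axis-aligned (one axis per feature element), feature elements are non-negative, and the `Neuron` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neuron:
    """Hypothetical axis-aligned hyper-ellipse neuron: center C, axis lengths B, label."""
    center: List[float]
    axes: List[float]
    label: str


def initial_radius(all_feature_vectors: List[List[float]]) -> float:
    """lambda = twice the largest feature element over the whole training set."""
    return 2.0 * max(max(f) for f in all_feature_vectors)


def new_spherical_neuron(f: List[float], label: str, lam: float) -> Neuron:
    """Temporary hyper-sphere of radius lam centred on feature vector f (b_j = lam for all j)."""
    return Neuron(center=list(f), axes=[lam] * len(f), label=label)
```

For example, `new_spherical_neuron(f, "R", initial_radius(training_vectors))` yields the temporary neuron that means 514 would then test against existing feature vectors.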

[0047] Means 528 determines whether the current training input is the last training input in the set of training inputs. If not, then means 528 directs processing of training system 500 to means 530, which selects the next training input as the current training input. Means 504 then generates the feature vector F corresponding to the current training input.

[0048] Means 506 and 508 determine which, if any, existing neurons are to be spatially adjusted to avoid encompassing feature vector F. In a preferred embodiment, means 510 adjusts an existing neuron if that neuron is not associated with the same known character as the current training input (as determined by means 506) and if it encompasses feature vector F (as determined by means 508). Means 508 determines if an existing neuron encompasses feature vector F by calculating the "distance" measure $r^2$ of Equation (4) and testing whether $r^2 < 1$, as described earlier.
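Equation (4) is referred to but not reproduced in this part of the text. Assuming it is the usual normalised distance for an axis-aligned hyper-ellipse, $r^2 = \sum_j ((f_j - c_j)/b_j)^2$, the test performed by means 508 (and by means 520) might look like the following sketch; the function names are illustrative only.

```python
from typing import Sequence


def r_squared(center: Sequence[float], axes: Sequence[float], f: Sequence[float]) -> float:
    """Assumed form of Equation (4): r^2 = sum_j ((f_j - c_j) / b_j)^2."""
    return sum(((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, center, axes))


def encompasses(center: Sequence[float], axes: Sequence[float], f: Sequence[float]) -> bool:
    """True only if f is strictly inside the neuron (r^2 < 1); boundary points are outside."""
    return r_squared(center, axes, f) < 1.0
```

Treating $r^2 = 1$ as "not encompassed" matches [0049], where an excluded vector may lie on the boundary of the adjusted neuron.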

[0049] In a preferred embodiment, means 510 spatially adjusts an existing neuron by optimally shrinking it along only one axis. In another preferred embodiment, means 510 shrinks an existing neuron proportionally along one or more axes. These shrinking methods are explained in greater detail later in this specification. After processing by means 510, the current feature vector is not encompassed by any existing neurons that are associated with a character different from the character associated with the training input. Hence, the current feature vector lies either outside or on the boundaries of such existing neurons.
[0050] Training system 500 also determines if a new neuron is to be created and, if so, creates that new neuron. A new neuron is created (by means 514) if the feature vector F is not encompassed by any existing neuron associated with the same character as the training input (as determined by means 512). As described above, means 514 creates a new neuron that is, preferably, a hyper-sphere of radius $\lambda$.

[0051] Training system 500 then tests and, if necessary, spatially adjusts each new neuron created by means 514 to ensure that it does not encompass any existing feature vectors that are associated with a character different from the character associated with the training input. Means 516, 524, and 526 control the sequence of testing a new neuron against each of the existing feature vectors by selecting one of the existing feature vectors at a time. If a new neuron is associated with a character different from that of the currently selected existing feature vector (as determined by means 518) and if the new neuron encompasses that selected existing feature vector (as determined by means 520 using Equation (4)), then means 522 spatially adjusts the new neuron by one of the same shrinking algorithms employed by means 510. Training system 500 continues to test and adjust a new neuron until all existing feature vectors have been processed. Since the hyper-spherical neuron created by means 514 is adjusted by means 522, that hyper-spherical neuron is a temporary neuron with temporary neuron axes of equal length. Processing of training system 500 then continues to means 528 to control the selection of the next training input.

[0052] In a preferred embodiment, the steps of (1) shrinking existing neurons for a given input, and (2) creating and shrinking a new neuron created for that same input may be performed in parallel. Those skilled in the art will understand that these two steps may also be performed sequentially in either order.

[0053] In a preferred embodiment, after all of the training inputs in the set of training inputs have been processed sequentially, means 528 directs processing of training system 500 to means 532. After processing a set of training inputs with their corresponding feature vectors, feature space is populated with both feature vectors and neurons. After processing the set of training inputs one time, some feature vectors may not be encompassed by any neurons. This occurs when feature vectors that were, at some point in the training process, encompassed by neuron(s) of the same character become excluded from those neurons when those neurons are shrunk to avoid subsequent feature vectors associated with a different character. In such a situation, means 532 directs processing to return to means 502 to repeat processing of the entire set of training inputs. When repeating this processing, the previously created neurons are retained. By iteratively repeating this training process, new neurons are created with each iteration until eventually each and every feature vector is encompassed by one or more neurons that are associated with the proper output and no feature vectors are encompassed by neurons associated with different possible outputs. Moreover, this iterative training is guaranteed to converge in a finite period of time, with the maximum number of iterations being equal to the total number of training inputs.
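The overall flow of means 502 through 532, including the repeated passes of [0053], can be sketched as follows. This is a simplified, hypothetical Python rendering rather than the patented implementation. It assumes axis-aligned neurons, the normalised distance $r^2 = \sum_j ((f_j - c_j)/b_j)^2$ as the Equation (4) test, one-axis shrinking as the adjustment step, and that no two inputs with different known outputs share an identical feature vector (otherwise no boundary could separate them). The names `train`, `inside`, `shrink_one_axis`, and the tolerance `EPS` are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Neuron = Tuple[Vector, Vector, str]  # (center C, axis lengths B, label)
EPS = 1e-9  # a point shrunk exactly onto a boundary counts as outside


def inside(c: Sequence[float], b: Sequence[float], f: Sequence[float]) -> bool:
    # Assumed Equation (4) test: r^2 = sum_j ((f_j - c_j) / b_j)^2 < 1.
    return sum(((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, c, b)) < 1.0 - EPS


def shrink_one_axis(c: Vector, b: Vector, f: Sequence[float]) -> None:
    # In place: shrink only the axis with the largest |f_j - c_j| / b_j so that
    # f lands on the boundary; every other axis keeps its length.
    u = [((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, c, b)]
    n = max(range(len(u)), key=lambda j: u[j])
    if u[n] == 0.0:
        raise ValueError("feature vector coincides with the neuron center")
    b[n] = abs(f[n] - c[n]) / (1.0 - (sum(u) - u[n])) ** 0.5


def train(samples: Sequence[Tuple[Vector, str]], lam: float) -> List[Neuron]:
    neurons: List[Neuron] = []
    processed: List[int] = []  # indices of "existing" feature vectors
    for _ in range(len(samples) + 1):  # bounded number of passes (means 532)
        for i, (f, label) in enumerate(samples):  # means 502 / 530
            # Rule 1 (means 506-510): shrink other-label neurons that contain f.
            for c, b, lab in neurons:
                if lab != label and inside(c, b, f):
                    shrink_one_axis(c, b, f)
            # Rule 2 (means 512-514): no same-label neuron contains f -> new neuron.
            if not any(lab == label and inside(c, b, f) for c, b, lab in neurons):
                c, b = list(f), [lam] * len(f)  # temporary hyper-sphere of radius lam
                # Means 516-526: exclude existing feature vectors of other labels.
                for j in processed:
                    g, glab = samples[j]
                    if glab != label and inside(c, b, g):
                        shrink_one_axis(c, b, g)
                neurons.append((c, b, label))
            if i not in processed:
                processed.append(i)
        # Stop when every vector is inside a same-label neuron and no other-label one.
        if all(
            any(inside(c, b, f) for c, b, lab in neurons if lab == label)
            and not any(inside(c, b, f) for c, b, lab in neurons if lab != label)
            for f, label in samples
        ):
            return neurons
    raise RuntimeError("training did not converge")
```

Calling `train(samples, lam=2.0)` on a list of `(feature_vector, label)` pairs returns `(center, axes, label)` tuples. Because shrinking only ever reduces axes, a vector excluded from a wrong-label neuron stays excluded, so later passes only need to add neurons for vectors that lost coverage.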

[0054] After training system 500 completes its processing, the feature space is populated with neurons that may then be used by classification system 300 or classification system 400 to classify an unknown input into one of a plurality of possible outputs.



OPTIMAL ONE-AXIS SHRINKING



[0055] As mentioned earlier, in a preferred embodiment, training system 500 spatially adjusts the boundary of a 
hyper-elliptical neuron to exclude a particular feature vector by optimally shrinking along one axis. Means 510 and 522 
of training system 500 may perform this one-axis shrinking by (1) identifying the axis to shrink, and (2) calculating the
new length for that axis. 

[0056] Training system 500 identifies the axis n to shrink by the formula: 



$$n = \arg\max_{j} \left[ \frac{|f_j - c_j| / b_j}{\sqrt{1 - \sum_{i=0,\, i \neq j}^{k-1} \left( \dfrac{f_i - c_i}{b_i} \right)^2}} \right] \tag{9}$$

where the function "argmax" returns the value of $j$ that maximizes the expression in the square brackets for any $j$ from 0 to $k-1$; $c_j$ and $b_j$ define the center point and axis lengths, respectively, of the neuron to be adjusted; and $f_j$ define the feature vector to be excluded by that neuron.

[0057] Training system 500 then calculates the new length $b_n'$ for axis $n$ by the equation:

$$b_n' = \frac{|f_n - c_n|}{\sqrt{1 - \sum_{j=0,\, j \neq n}^{k-1} \left( \dfrac{f_j - c_j}{b_j} \right)^2}} \tag{10}$$

In one-axis shrinking, all other axes retain their original lengths $b_j$.

[0058] One-axis shrinking of an original hyper-elliptical neuron according to Equations (9) and (10) results in an adjusted neuron with the greatest hyper-volume $V$ that satisfies the following four criteria:

(1) The adjusted neuron is a hyper-ellipse;

(2) The center point of the original neuron is the same as the center point of the adjusted neuron;

(3) The feature vector to be excluded lies on the boundary of the adjusted neuron; and

(4) All points within or on the boundary of the adjusted neuron lie within or on the boundary of the original neuron.

The hyper-volume $V$ is defined by:

$$V = q_k \prod_{j=0}^{k-1} b_j' \tag{11}$$

where $q_k$ is a constant that depends on the value of $k$, where $k$ is the dimension of the feature space, and $b_j'$ are the lengths of the axes defining the adjusted neuron. One-axis shrinking, therefore, provides a first method for optimally adjusting neurons according to the present invention.
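A sketch of optimal one-axis shrinking follows, derived directly from the four criteria of [0058] for an axis-aligned neuron. Writing $u_j = ((f_j - c_j)/b_j)^2$, shrinking only axis $j$ so that the excluded vector lands on the boundary scales the hyper-volume by $\sqrt{u_j / (1 - \sum_i u_i + u_j)}$, which increases with $u_j$; the best axis is therefore the one with the largest $|f_j - c_j|/b_j$. The function names are hypothetical, and $q_k = \pi^{k/2}/\Gamma(k/2 + 1)$ is the standard volume constant of a $k$-dimensional ellipsoid.

```python
import math
from typing import List, Sequence


def one_axis_shrink(c: Sequence[float], b: Sequence[float], f: Sequence[float]) -> List[float]:
    """Return new axis lengths after optimal one-axis shrinking.

    Assumes f lies strictly inside the neuron and differs from its center.
    """
    u = [((fj - cj) / bj) ** 2 for fj, cj, bj in zip(f, c, b)]
    total = sum(u)
    if not 0.0 < total < 1.0:
        raise ValueError("f must lie strictly inside the neuron and differ from its center")
    # The volume ratio b_j'/b_j = sqrt(u_j / (1 - total + u_j)) grows with u_j,
    # so the axis to shrink is the one with the largest |f_j - c_j| / b_j.
    n = max(range(len(u)), key=lambda j: u[j])
    new_b = list(b)
    # Choose b_n' so that f lies exactly on the new boundary (r^2 == 1).
    new_b[n] = abs(f[n] - c[n]) / math.sqrt(1.0 - (total - u[n]))
    return new_b


def hypervolume(b: Sequence[float]) -> float:
    """V = q_k * prod(b_j) with q_k = pi^(k/2) / Gamma(k/2 + 1)."""
    k = len(b)
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1) * math.prod(b)
```

Because $\sum_i u_i < 1$ for an encompassed vector, the ratio never exceeds 1, so the adjusted neuron stays inside the original one (criterion 4).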

PROPORTIONAL SHRINKING ALGORITHM

[0059] In an alternative preferred embodiment, training system 500 spatially adjusts the boundary of a hyper-elliptical neuron to exclude a particular feature vector by shrinking proportionally along one or more axes. Means 510 and 522 of training system 500 may perform proportional shrinking by calculating the vector $\Delta B$ of axis length changes $\Delta b_j$, where:



$$\Delta B = (\Delta b_0, \Delta b_1, \ldots, \Delta b_{k-1}) \tag{13}$$

$$\mathrm{Cos} = \text{vector of cosines} = \frac{|F - C|}{\lVert F - C \rVert} \tag{14}$$

$$|F - C| = (|f_0 - c_0|, |f_1 - c_1|, \ldots, |f_{k-1} - c_{k-1}|) \tag{15}$$

$$F = (f_0, f_1, \ldots, f_{k-1}) \tag{16}$$

$$C = (c_0, c_1, \ldots, c_{k-1}) \tag{17}$$

$$\operatorname{diag}(b_0, b_1, \ldots, b_{k-1}) \tag{18}$$

$$\operatorname{diag}(f_0, f_1, \ldots, f_{k-1}) \tag{19}$$

$$\operatorname{diag}(c_0, c_1, \ldots, c_{k-1}) \tag{20}$$

$$\Gamma = (\gamma_0, \gamma_1, \ldots, \gamma_{k-1}) \tag{21}$$

where $|f_j - c_j|$ is the absolute value of $(f_j - c_j)$; $\lVert F - C \rVert$ is the magnitude of the vector difference between $F$ and $C$; $c_j$ and $b_j$ define the center point and axis lengths, respectively, of the neuron to be adjusted; $f_j$ are the elements of the feature vector to be excluded from that neuron; and $\alpha$ and $\gamma_j$ may be constants. The new axis lengths $b_j'$ for the adjusted neuron are calculated by:

$$b_j' = b_j + \Delta b_j \tag{22}$$

for $j$ from 0 to $k-1$.

[0060] In proportional shrinking, training system 500 determines the projections of a vector onto the axes of the neuron to be adjusted, where the vector points from the center of that neuron to the feature vector to be excluded. These projections are represented by the vector of cosines of Equation (14). Training system 500 then determines how much to shrink each neuron axis based on the relationship between the length of the axis and the length of the projection onto that axis.

[0061] In a preferred embodiment, the constant $\alpha$ in Equation (12) is selected to be less than 1. In this case, training system 500 may perform iterative shrinking, where the neuron is slowly adjusted over multiple axis-shrinking steps until it is determined that the feature vector to be excluded is outside the adjusted neuron. In a preferred embodiment, parameter $\gamma_j$ may be set to a positive value that is roughly 0.001 times the size of axis $j$ to ensure that proportional shrinking eventually places the feature vector outside the neuron. In an alternative preferred embodiment, the parameters $\gamma_j$ may be error functions based on the distance from the feature vector to the boundary of the slowly adjusted neuron. In such case, training system 500 may operate as a proportional-integral controller for adjusting neurons.
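Equation (12), which combines $\alpha$ with the vector of cosines of Equation (14) and the other quantities listed in [0059], is not legible in this copy. The iterative loop below therefore uses a hypothetical update rule that only illustrates the mechanics described in [0060] and [0061]: each axis shrinks by an amount that grows with its direction cosine and with the gap between the axis length and the projection $|f_j - c_j|$, plus a small constant step $\gamma_j$ of roughly 0.001 times the original axis length, which guarantees termination. Treat the rule, the default `alpha`, and the function name as assumptions, not the patented formula.

```python
import math
from typing import List, Sequence


def proportional_shrink(c: Sequence[float], b: Sequence[float], f: Sequence[float],
                        alpha: float = 0.5, gamma_frac: float = 0.001,
                        max_steps: int = 10_000) -> List[float]:
    """Iteratively shrink all axes until f is on or outside the neuron (r^2 >= 1).

    HYPOTHETICAL update (not Equation (12)):
        delta_b_j = -alpha * cos_j * (b_j - |f_j - c_j|) - gamma_j
    """
    d = [abs(fj - cj) for fj, cj in zip(f, c)]  # |F - C|
    norm = math.sqrt(sum(x * x for x in d))  # ||F - C||
    if norm == 0.0:
        raise ValueError("f coincides with the neuron center")
    cos = [x / norm for x in d]  # vector of direction cosines
    gamma = [gamma_frac * bj for bj in b]  # small positive constant steps
    b = list(b)
    for _ in range(max_steps):
        if sum((dj / bj) ** 2 for dj, bj in zip(d, b)) >= 1.0:
            return b  # f now lies on or outside the adjusted neuron
        # While f is inside, every |f_j - c_j| < b_j, so each delta_b_j is negative.
        # b_j' = b_j + delta_b_j; the max() keeps every axis strictly positive.
        b = [max(bj - alpha * cj * (bj - dj) - gj, 1e-12)
             for bj, cj, dj, gj in zip(b, cos, d, gamma)]
    raise RuntimeError("did not exclude f within max_steps")
```

With `alpha` below 1 the neuron contracts gradually over several steps, which is the slow, iterative behaviour [0061] describes; `gamma` alone would exclude the vector within about 1000 steps even if `alpha` were 0.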


ORDERING OF TRAINING INPUTS 

[0062] In a preferred embodiment of the present invention, the set of training inputs, used sequentially by the training system to generate neurons, may be organized according to input quality. The training inputs may be ordered to train with higher quality inputs before proceeding to those of lower quality. This quality ordering of training inputs ensures that neurons are centered about feature vectors that correspond to inputs of higher quality. Such ordered training may improve the performance efficiency of a classification system by reducing the number of neurons needed to define the classification system. Such ordering may also reduce the number of misclassifications and non-classifications made by the classification system. A misclassification is when a classification system selects one possible output when, in truth, the input corresponds to a different possible output. A non-classification is when a classification system fails to select one of the known outputs and instead outputs a no-output-selected result.

[0063] Referring now to Figs. 1(a), 1(b), 1(c), and 1(d), there are shown bitmap representations of a nominal letter "O", a degraded letter "O", a nominal number "7", and a degraded number "7", respectively. A nominal input is an ideal input with no noise associated with it. A degraded input is one in which noise has created deviations from the nominal input. Degraded inputs may result from either controlled noise or real unpredictable noise.

[0064] In a preferred embodiment, the training system of the present invention may train with training inputs of three different quality levels. The first level of training inputs are nominal inputs like those presented in Figs. 1(a) and 1(c). The second level of training inputs are controlled-noise inputs, a type of degraded input created by applying defined noise functions or signals with different characteristics, either independently or in combination, to nominal inputs. The third level of training inputs are real-noise inputs, a second type of degraded input which, in the case of characters, may be optically acquired images of known characters. Such degraded inputs have real unpredictable noise. Figs. 1(b) and 1(d) present representations of possible controlled-noise inputs and real-noise inputs. In a preferred embodiment, the nominal inputs have the highest quality, with the controlled-noise inputs and real-noise inputs of successively lower quality. Depending upon the controlled-noise functions and signals applied, a particular controlled-noise input may be of greater or lesser quality than a particular real-noise input.

[0065] The quality of a particular degraded input, of either the controlled-noise or real-noise variety, may be determined by comparing the degraded input to a nominal input corresponding to the same known character. In a preferred embodiment, a quality measure may be based on the number of pixels that differ between the two inputs. In another preferred embodiment, the quality measure may be based on conventional feature measures such as Grid or Hadamard features.

[0066] In a preferred embodiment, training systems of the present invention train first with the nominal inputs and then later with degraded controlled-noise and real-noise inputs. In this preferred embodiment, training with inputs corresponding to Figs. 1(a) and 1(c) would precede training with those of Figs. 1(b) and 1(d). In another preferred embodiment, the training system trains with all inputs of the same known character prior to proceeding to the next known character, and the training inputs of each known character are internally organized by quality. In this preferred embodiment, training with Fig. 1(a) precedes that with Fig. 1(b), and training with Fig. 1(c) precedes that with Fig. 1(d). Those skilled in the art will understand that the exact overall sequence of training with all of the inputs is of lesser importance than the ordering of inputs by quality for each different known character.
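A pixel-difference quality measure ([0065]) combined with the per-character ordering of [0066] can be sketched as below. Bitmaps are assumed to be equally sized lists of 0/1 rows, and a nominal bitmap is assumed to be available for each known character; the function names are hypothetical. Characters are visited in sorted order only for determinism, since the text notes that the overall character sequence matters less than the quality order within each character.

```python
from typing import Dict, List, Sequence, Tuple

Bitmap = Sequence[Sequence[int]]  # rows of 0/1 pixels


def pixel_difference(a: Bitmap, b: Bitmap) -> int:
    """Quality measure: number of differing pixels (0 means identical to nominal)."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def order_by_quality(inputs: List[Tuple[str, Bitmap]],
                     nominal: Dict[str, Bitmap]) -> List[Tuple[str, Bitmap]]:
    """Group training inputs by known character, highest quality (fewest differences) first."""
    return sorted(inputs, key=lambda item: (item[0], pixel_difference(item[1], nominal[item[0]])))
```

A nominal input scores 0 and therefore always comes first for its character, so each neuron created for that character is centred on the cleanest available example.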

REFINEMENT OF NEURONS

[0067] After the training system of the present invention has completed training, the feature space is populated with neurons that encompass feature vectors, with one feature vector corresponding to each distinct training input. Each neuron may encompass one or more feature vectors: the one at the center of the neuron that was used to create the neuron, and possibly other feature vectors corresponding to inputs associated with the same known character.

[0068] Depending upon the quality ordering of the training inputs used in the sequential training, a particular neuron may encompass those feature vectors in a more or less efficient manner. For example, if the feature vector used to create a particular neuron corresponds to a highly degraded input, then that feature vector will lie at the center of that neuron. That same neuron may also encompass other feature vectors corresponding to nominal inputs and inputs of lesser degradation. Such a neuron may not be the most efficient neuron for encompassing that set of feature vectors. A classification system using such a neuron may make more misclassifications and non-classifications than one using a more efficient neuron.

[0069] A refinement system of the present invention spatially adjusts neurons, created during training, to create more efficient neurons. This refinement system may characterize the spatial distribution of feature vectors encompassed by a particular neuron and then spatially adjust that neuron. Such spatial adjustment may involve translating the neuron from its current center point toward the mean of the spatial distribution of those feature vectors. After translating the neuron, the axis lengths may be adjusted to ensure that feature vectors of the same output character are encompassed by the neuron and to ensure that feature vectors of different output characters are excluded.
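The translation step of [0069] might be sketched as follows, assuming the same-character feature vectors encompassed by the neuron are already known. The `step` fraction (1.0 moves the center all the way to the mean) and the function name are hypothetical; the subsequent re-fitting of axis lengths is not shown.

```python
from typing import List, Sequence


def translate_toward_mean(center: Sequence[float], members: Sequence[Sequence[float]],
                          step: float = 1.0) -> List[float]:
    """Move a neuron center a fraction `step` of the way toward the mean of the
    same-character feature vectors it encompasses."""
    if not members:
        raise ValueError("a neuron always encompasses at least its own feature vector")
    k = len(center)
    mean = [sum(v[j] for v in members) / len(members) for j in range(k)]
    return [c + step * (m - c) for c, m in zip(center, mean)]
```

For a neuron created from a highly degraded input, as in [0068], this moves the center away from that outlier and toward the bulk of the nominal and lightly degraded vectors it covers.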

[0070] In an alternative embodiment, the refinement system may spatially adjust two or more neurons of the same character to create one or more neurons that more efficiently encompass the same feature vectors, where a feature vector from one original neuron may be encompassed by a different, more efficient neuron. For example, before refinement, a first neuron may encompass feature vectors $F_1$, $F_2$, and $F_3$, and a second neuron may encompass feature vectors $F_4$, $F_5$, $F_6$, and $F_7$. After refinement, feature vectors $F_1$, $F_2$, $F_3$, and $F_4$ may be encompassed by a third neuron, and feature vectors $F_5$, $F_6$, and $F_7$ may be encompassed by a fourth neuron, where the centers and axis lengths of the third and fourth neurons are all different from those of the first and second neurons.

CLASSIFYING SYSTEMS WITH CLUSTER CLASSIFIERS

[0071] In a first preferred embodiment of the present invention, a classification system classifies inputs into one of a set of possible outputs by comparing the feature vector for each input to be classified with every neuron in the feature space. Such classification systems are presented in Figs. 3 and 4.

[0072] Referring now to Fig. 6, there is shown classification system 600, a second preferred embodiment of the present invention, in which inputs are classified into one of a set of possible outputs using neurons and cluster classifiers. Classification system 600 includes top-level classifier 602 and two or more cluster classifiers 604, 606, ..., 608. Top-level classifier 602 classifies inputs into appropriate clusters of inputs. For example, where classification system 600 classifies characters, top-level classifier 602 may classify input bitmaps corresponding to optically acquired characters into clusters of characters.

[0073] The characters clustered together may be those represented by similar bitmaps or, in other words, those characters associated with feature vectors close to one another in feature space. For example, a first character cluster may correspond to the characters "D", "P", "R" and "B". A second character cluster may correspond to the characters "O", "C", "D", "U", and "Q". A third cluster may correspond to only one character, such as the character "Z". A particular character may be in more than one character cluster. In this example, the character "D" is in both the first and the second character clusters, because its bitmaps are similar to those of both clusters.

[0074] In a preferred embodiment, before training, characters are clustered based on a confusion matrix. The confusion matrix represents the likelihood that one character will be confused with another character for every possible pair of characters. In general, the closer the feature vectors of one character are to those of another character, the higher the likelihood that those two characters may be confused. For example, the character "D" may have a higher confusion likelihood with respect to the "O" than to the "M" if the feature vectors for "D" are closer to the feature vectors for "O" than to those for "M".

[0075] In a preferred embodiment, the clustering of characters is based upon a conventional K-Means Clustering Algorithm, in which a set of templates is specified for each character, where each template is a point in feature space. The K-Means Clustering Algorithm determines where in feature space to locate the templates for a particular character by analyzing the locations of the feature vectors for all of the training inputs corresponding to that character. Templates are preferably positioned near the arithmetic means of clusters of associated feature vectors.
[0076] In a preferred embodiment, four templates may be used for each character and the number of characters per cluster may be roughly even. For example, when classifying the 64 characters corresponding to the 26 capital and 26 lower-case letters, the 10 digits, and the symbols "&" and "#", 4 × 64, or 256, templates may be used to define 7 different clusters of roughly equivalent numbers of characters.
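Template placement in [0075] can be illustrated with a plain Lloyd's K-Means sketch; with four templates per character, as in [0076], one would call `kmeans(vectors_for_char, 4)` once per character. The first-k initialisation, the fixed iteration count, and the function name are assumptions for illustration, and the subsequent grouping of characters into roughly equal clusters is not shown.

```python
from typing import List, Sequence

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Plain Lloyd's K-Means: return k templates (cluster means) in feature space.

    Initialisation takes the first k points, so results depend on input order.
    """
    templates = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            # Assign each feature vector to its nearest template (squared Euclidean distance).
            nearest = min(range(k), key=lambda t: sum((a - b) ** 2 for a, b in zip(p, templates[t])))
            groups[nearest].append(p)
        for t, g in enumerate(groups):
            if g:  # keep the previous template if no vector was assigned to it
                templates[t] = [sum(col) / len(g) for col in zip(*g)]
    return templates
```

Each returned template sits at the arithmetic mean of its group of feature vectors, matching the positioning described in [0075].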

[0077] By clustering characters, top-level classifier 602 may implement a classification algorithm that quickly and accurately determines the appropriate cluster for each input. In a preferred embodiment, top-level classifier 602 implements a neuron-based classification algorithm. In another preferred embodiment, other conventional non-neural classification algorithms may be performed by top-level classifier 602. Top-level classifier 602 selects the appropriate cluster for a particular input and directs processing to continue to the appropriate cluster classifier 604, 606, ..., 608. Each cluster classifier is associated with one and only one character cluster, and vice versa.

[0078] In one preferred embodiment, each cluster classifier may implement a classification algorithm unique to that character cluster, or shared by only a subset of the total number of character clusters. Each cluster classifier may therefore employ neurons that exist in a feature space unique to that character cluster. For example, training for the "P", "R", "B" cluster may employ a particular set of Grid features, while training for the "O", "C", "D", "U", "Q" cluster may employ a different set of Hadamard features. In that case, different training procedures are performed for each different cluster classifier, where only inputs corresponding to those characters of the associated cluster are used for each different training procedure.

[0079] In a third preferred embodiment of the present invention, a classification system according to Fig. 6 may classify inputs into one of a set of possible outputs using neurons and cluster classifiers. In this third embodiment, top-level classifier 602 identifies the template in feature space closest to the feature vector for the current input to be classified. The identified template is associated with a particular character that belongs to one or more character clusters. The top-level classifier 602 directs processing to only those cluster classifiers 604, 606, ..., 608 associated with the character clusters of the closest template. Since a particular character may be in more than one character cluster, more than one cluster classifier may be selected by top-level classifier 602 for processing.
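The routing performed by top-level classifier 602 in this third embodiment can be sketched as follows. Euclidean distance to the templates, the dictionary and set data layout, and the function name `route` are assumptions; the returned indices stand for cluster classifiers 604, 606, ..., 608.

```python
from typing import Dict, List, Sequence, Set


def route(feature: Sequence[float],
          templates: Dict[str, List[Sequence[float]]],
          clusters: List[Set[str]]) -> List[int]:
    """Find the character owning the template nearest the feature vector, then
    return the indices of every cluster classifier whose cluster contains it."""
    def sqdist(t: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feature, t))

    best_char = min(templates, key=lambda ch: min(sqdist(t) for t in templates[ch]))
    return [i for i, members in enumerate(clusters) if best_char in members]
```

Using the example clusters of [0073], an input whose nearest template belongs to "D" is sent to both the "D", "P", "R", "B" classifier and the "O", "C", "D", "U", "Q" classifier.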

[0080] In a fourth preferred embodiment, each cluster classifier may have a decision tree that identifies those neurons that should be processed for a given input. Prior to classifying, feature vector space for a particular cluster classifier may be divided into regions according to the distribution of feature vectors and/or neurons in feature space. Each region contains one or more neurons, each neuron may belong to more than one region, and two or more regions may overlap. Top-level classifier 602 may determine in which feature-space region (or regions) the feature vector for the current input lies and may direct the selected cluster classifiers to process only those neurons associated with that region (or those regions).

[0081] Those skilled in the art will understand that some classification systems of the present invention may use
decision trees without cluster classifiers, some may use cluster classifiers without decision trees, some may use both, 
and others may use neither. Those skilled in the art will further understand that decision trees and cluster classifiers 
may increase the efficiency of classification systems of the present invention by reducing processing time. 

PREFERRED AND ALTERNATIVE PREFERRED EMBODIMENTS

[0082] Those skilled in the art will understand that classifying systems of the present invention may be arranged in series or parallel. For example, in a preferred embodiment, a first character classifier based on Grid features may be arranged in series with a second character classifier based on Hadamard features. In such case, the first classifier classifies a particular bitmap input as one of the known characters or it fails to classify that input. If it fails to classify, then the second classifier attempts to classify that input.

[0083] In an alternative embodiment, two or more different classifiers may be arranged in parallel. In such case, a 
voting scheme may be employed to select the appropriate output by comparing the outputs of each different classifier. 
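The series arrangement of [0082] and the parallel voting of [0083] can be sketched as below, modelling each classifier as a function that returns a label or `None` for a failed classification. The simple majority vote, and treating a tie as a no-output-selected result, are assumptions; the specification does not fix a particular voting scheme.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

Classifier = Callable[[Sequence[float]], Optional[str]]  # None = no classification


def classify_in_series(x: Sequence[float], classifiers: List[Classifier]) -> Optional[str]:
    """Try each classifier in turn; fall through only when one fails to classify."""
    for clf in classifiers:
        label = clf(x)
        if label is not None:
            return label
    return None


def classify_in_parallel(x: Sequence[float], classifiers: List[Classifier]) -> Optional[str]:
    """Run every classifier and take a simple majority vote among the outputs."""
    votes = Counter(label for label in (clf(x) for clf in classifiers) if label is not None)
    if not votes:
        return None
    (top, n), *rest = votes.most_common()
    if rest and rest[0][1] == n:
        return None  # tie: treated here as a no-output-selected result
    return top
```

In the series case the order matters, so the Grid-feature classifier of [0082] would be placed first in the list.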
[0084] In a preferred embodiment, classification systems and training systems of the present invention perform parallel processing, where each elliptical processing unit may run on a separate computer processor during classification, although those skilled in the art will understand that these systems may also perform serial processing. In a preferred embodiment, the classification systems and training systems may reside in a reduced instruction set computer (RISC) processor such as a SPARC 2 processor running on a SPARCstation 2 marketed by Sun Microsystems.
[0085] Those skilled in the art will understand that inputs other than character images may be classified with the classification systems of the present invention. In general, any input may be classified as being one of a set of two or more possible outputs, where a no-selection result is one of the possible outputs. For example, the classification systems of the present invention may be used to identify persons based upon images of their faces, fingerprints, or even earlobes. Other classification systems of the present invention may be used to identify people from recordings of their

[0086] It will be further understood that various changes in the details, materials, and arrangements of the parts
which have been described and illustrated in order to explain the nature of this invention may be made by those skilled 
in the art without departing from the scope of the invention as expressed in the following claims. 



Claims

1. A training method for creating a new neuron in a feature space having at least one existing neuron, comprising the steps of:

(a) generating a feature vector (504) representative of a training input wherein said training input corresponds to one of a plurality of possible outputs; and

(b) if no existing neuron corresponding to said training input encompasses said feature vector (512) then creating said new neuron;

characterised in that said new neuron comprises a boundary defined by two or more neuron axes of different length;

said feature space comprises an existing feature vector;

and step (b) further comprises the steps of:

(i) creating (514) a temporary neuron comprising a boundary defined by two or more temporary neuron axes; and

(ii) if said temporary neuron encompasses said existing feature vector (508) and said existing feature vector does not correspond to said training input (506) then spatially adjusting said temporary neuron to create said new neuron, and in that step (b)(ii) comprises the steps of:

(1) selecting at least one of said temporary neuron axes;

(2) calculating the distances along each of said selected temporary neuron axes from the center of said temporary neuron to said existing feature vector; and

(3) reducing (510) said selected temporary neuron axes by amounts proportional to said distances and the lengths of said selected temporary neuron axes to create said new neuron.

2. The method of claim 1, wherein step (b)(ii)(1) comprises the step of selecting all of said temporary neuron axes.

3. The method of claim 1, wherein in step (b)(ii):

(1) only one of said temporary neuron axes is selected; and

(2) a distance in accordance with said selected temporary neuron axis and said existing feature vector is calculated; and

(3) said selected temporary neuron axis is reduced by said distance to create said new neuron.

4. The method of one of the claims 1 or 3, wherein said training input is representative of a character.

5. The method of one of the claims 1 or 3, wherein said information representative of said input comprises a feature vector.

6. The method of claim 5, wherein said feature vector comprises a feature element, wherein said feature element is a Grid feature element or a Hadamard feature element.

7. The method of one of the claims 1 or 3, wherein said neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional feature space, where k is greater than or equal to two.

8. A training apparatus for creating a new neuron in a feature space having at least one existing neuron, comprising:

generating means (504) for generating a feature vector representative of a training input, wherein said training input corresponds to one of a plurality of possible outputs; and

creating means (514) for creating said new neuron, if no existing neuron (506) corresponding to said training input encompasses said feature vector;

characterised in that said new neuron comprises a boundary defined by two or more neuron axes of different length;

said feature space comprises an existing feature vector;

said creating means creates a temporary neuron comprising a boundary defined by two or more temporary neuron axes; and

if said temporary neuron encompasses said existing feature vector (508) and said existing feature vector does not correspond to said training input, then said creating means spatially adjusts said temporary neuron to create said new neuron; and in that said creating means

(1) selects at least one of said temporary neuron axes;

(2) calculates the distances along each of said selected temporary neuron axes from the center of said temporary neuron to said existing feature vector; and

(3) reduces said selected temporary neuron axes by amounts proportional to said distances and the lengths of said selected temporary neuron axes to create said new neuron.






EP1 197 914B1 



9. The apparatus of claim 8, wherein said creating means selects all of said temporary neuron axes for said reduction. 

10. The apparatus of claim 8, wherein said creating means selects only one of said temporary neuron axes, said creating means calculates a distance in accordance with said selected temporary neuron axis and said existing feature vector, and said creating means reduces said selected temporary neuron axis by said distance to create said new neuron.
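Claim 10 reduces a single axis by a calculated distance. The sketch below assumes an axis-aligned hyper-ellipse. As a hypothetical selection rule, it picks the axis along which the existing vector has the largest normalized offset. It then computes the distance as the amount that axis must shrink for the existing vector to fall just outside. Let R be the contribution of the other axes. The vector is outside once (d_j / b_j')^2 > 1 - R, that is, once b_j' < |d_j| / sqrt(1 - R). `margin` is an assumed safety factor below 1, and all names are illustrative.

```python
import math
from typing import List, Sequence


def shrink_one_axis(center: Sequence[float], axes: Sequence[float],
                    existing: Sequence[float], margin: float = 0.99) -> List[float]:
    """Reduce a single semi-axis of an axis-aligned hyper-ellipse by the
    distance needed to place `existing` just outside it (hypothetical rule)."""
    d = [xj - cj for xj, cj in zip(existing, center)]
    norm = [(dj / bj) ** 2 for dj, bj in zip(d, axes)]
    if sum(norm) > 1.0:
        return list(axes)  # not encompassed: nothing to adjust
    # Assumed selection rule: the axis with the largest normalised offset.
    j = max(range(len(axes)), key=lambda i: norm[i])
    if norm[j] == 0.0:
        raise ValueError("existing vector lies at the neuron center")
    rest = sum(n for i, n in enumerate(norm) if i != j)  # other axes, < 1
    target = margin * abs(d[j]) / math.sqrt(1.0 - rest)
    distance = axes[j] - target  # the amount by which axis j is reduced
    new_axes = list(axes)
    new_axes[j] -= distance
    return new_axes
```

Only axis j changes. Every other axis keeps its length, which is the difference from the all-axes reduction of claim 9.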

11. The apparatus of claim 8, wherein said training input is representative of a character.

12. The apparatus of claim 8, wherein said information representative of said input comprises a feature vector.

13. The apparatus of claim 12, wherein said feature vector comprises a feature element, wherein said feature element is a Grid feature element or a Hadamard feature element.

14. The apparatus of claim 12, wherein said comparing means determines whether said neuron encompasses said feature vector.

15. The apparatus of claim 12, wherein said comparing means determines a distance measure from said feature vector to said neuron.
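Claims 14 and 15 do not specify the comparing means' distance measure. Below is a minimal sketch that assumes hyper-ellipse neurons stored as (center, semi-axes, label). It uses the axis-normalized distance sqrt(S), which is at most 1 exactly when the neuron encompasses the vector. The "nearest neuron wins" classification rule and all names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical neuron record: (center, semi-axes, output label).
Neuron = Tuple[Sequence[float], Sequence[float], str]


def scaled_distance(x: Sequence[float], center: Sequence[float],
                    axes: Sequence[float]) -> float:
    """Axis-normalised distance; a value <= 1 means the neuron encompasses x."""
    return math.sqrt(sum(((xj - cj) / bj) ** 2
                         for xj, cj, bj in zip(x, center, axes)))


def classify(x: Sequence[float], neurons: List[Neuron]) -> Tuple[str, bool]:
    """Label of the nearest neuron and whether that neuron encompasses x."""
    best = min(neurons, key=lambda n: scaled_distance(x, n[0], n[1]))
    return best[2], scaled_distance(x, best[0], best[1]) <= 1.0
```

Normalizing by the axis lengths makes an elongated neuron "closer" along its long axis. This matches the intent of using axes of different length.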


16. The apparatus of claim 8, wherein said neuron is a hyper-ellipse or a hyper-rectangle in k-dimensional feature 
space, where k is greater than or equal to two. 



Patentansprüche (Claims)

1. A training method for creating a new neuron in a feature space in which at least one neuron already exists, comprising the steps of:

(a) generating a feature vector (504) representative of a training input, wherein said training input corresponds to one of a plurality of possible outputs; and

(b) if no existing neuron corresponding to said training input encompasses said feature vector (512), creating said new neuron,

characterised in that

said new neuron has a boundary defined by two or more neuron axes of different lengths;
said feature space contains an existing feature vector; and
step (b) further comprises the following steps:

(i) creating (514) a temporary neuron having a boundary defined by two or more temporary neuron axes; and

(ii) if said temporary neuron encompasses said existing feature vector (508) and said existing feature vector does not correspond to said training input (506), spatially adjusting said temporary neuron to create said new neuron,
wherein step (b)(ii) comprises the following steps:

(1) selecting at least one of said temporary neuron axes;

(2) calculating the distance along each of said selected temporary neuron axes from the center of said temporary neuron to said existing feature vector; and

(3) reducing (510) said selected temporary neuron axes by amounts proportional to said distances and the lengths of said selected temporary neuron axes to create said new neuron.

2. The method of claim 1, wherein step (b)(ii)(1) comprises the step of selecting all of said temporary neuron axes.

3. The method of claim 1, wherein in step (b)(ii):

(1) only one of said temporary neuron axes is selected; and

(2) a distance is calculated in accordance with said selected temporary neuron axis and said existing feature vector; and

(3) said selected temporary neuron axis is reduced by said distance to create said new neuron.

4. The method of one of claims 1 or 3, wherein said training input represents a character.

5. The method of claim 1 or 3, wherein said information representing said input comprises a feature vector.

6. The method of claim 5, wherein said feature vector comprises a feature element, said feature element being a grid feature element or a Hadamard feature element.

7. The method of one of claims 1 or 3, wherein said neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional feature space, where k ≥ 2.

8. A training apparatus for creating a new neuron in a feature space that already contains at least one neuron, comprising:

generating means (504) for generating a feature vector representative of a training input, wherein said training input corresponds to one of a plurality of possible outputs; and
creating means (514) for creating said new neuron if no existing neuron corresponding to said training input encompasses said feature vector;

characterised in that

said new neuron has a boundary defined by two or more neuron axes of different lengths;
said feature space contains an existing feature vector;

said creating means creates a temporary neuron having a boundary defined by two or more temporary neuron axes; and

if said temporary neuron encompasses said existing feature vector (508) and said existing feature vector does not correspond to said training input, said creating means spatially adjusts said temporary neuron to create said new neuron, wherein said creating means

(1) selects at least one of said temporary neuron axes;

(2) calculates the distance along each of said selected temporary neuron axes from the center of said temporary neuron to said existing feature vector; and

(3) reduces said selected temporary neuron axes by amounts proportional to said distances and the lengths of said selected temporary neuron axes to create said new neuron.


9. The apparatus of claim 8, wherein said creating means selects all of said temporary neuron axes for the reduction.

10. The apparatus of claim 8, wherein said creating means selects only one of said temporary neuron axes, calculates a distance in accordance with said selected temporary neuron axis and said existing feature vector, and reduces said selected temporary neuron axis by said distance to create said new neuron.

11. The apparatus of claim 8, wherein said training input represents a character.


12. The apparatus of claim 8, wherein said information representing said input comprises a feature vector.

13. The apparatus of claim 12, wherein said feature vector comprises a feature element, said feature element being a grid feature element or a Hadamard feature element.


14. The apparatus of claim 12, wherein the comparing means determines whether said neuron encompasses said feature vector.
15. The apparatus of claim 12, wherein the comparing means determines a distance measure from said feature vector to said neuron.

16. The apparatus of claim 8, wherein said neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional feature space, where k ≥ 2.



Revendications (Claims)

1. A training method for creating a new neuron in a feature space comprising at least one existing neuron, comprising the following steps:

(a) generating a feature vector (504) representative of a training input, said training input corresponding to one of a plurality of possible outputs; and

(b) if no existing neuron corresponding to said training input includes said feature vector (512), then creating said new neuron;

characterised in that

said new neuron has a boundary defined by two or more neuron axes of different lengths;
said feature space comprises an existing feature vector; and step (b) further comprises the following steps:

(i) creating (514) a temporary neuron having a boundary defined by two or more temporary neuron axes; and

(ii) if said temporary neuron includes said existing feature vector (508), and if said existing feature vector does not correspond to said training input (506), then spatially adjusting said temporary neuron to create said new neuron,

and in that step (b)(ii) comprises the following steps:

(1) selecting at least one of said temporary neuron axes;

(2) calculating the distances along each of said selected temporary neuron axes between the center of said temporary neuron and the existing feature vector; and

(3) reducing (510) said temporary neuron axes by amounts proportional to said distances and to the lengths of said selected temporary neuron axes to create said new neuron.

2. The method of claim 1, wherein step (b)(ii)(1) comprises the step of selecting all of said temporary neuron axes.

3. The method of claim 1, wherein in step (b)(ii):

(1) only one of said temporary neuron axes is selected; and

(2) a distance in accordance with said selected temporary neuron axis and said existing feature vector is calculated; and

(3) said selected temporary neuron axis is reduced by said distance to create said new neuron.

4. The method of one of claims 1 or 3, wherein said training input is representative of a character.

5. The method of one of claims 1 or 3, wherein said information representative of said input comprises a feature vector.

6. The method of claim 5, wherein said feature vector comprises a feature element, said feature element being a Grid feature element or a Hadamard feature element.

7. The method of one of claims 1 or 3, wherein said neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional feature space, where k is greater than or equal to two.
8. A training apparatus for creating a new neuron in a feature space comprising at least one existing neuron, comprising:

generating means (504) for generating a feature vector representative of a training input, said training input corresponding to one of a plurality of possible outputs; and

creating means (514) for creating said new neuron if no existing neuron (506) corresponding to said training input includes said feature vector;

characterised in that

said new neuron has a boundary defined by two or more neuron axes of different lengths;

said feature space comprises an existing feature vector; said creating means create a temporary neuron having a boundary defined by two or more temporary neuron axes; and
if said temporary neuron includes said existing feature vector (508) and if said existing feature vector does not correspond to said training input, then said creating means spatially adjust said temporary neuron to create said new neuron;

and in that said creating means

(1) select at least one of said temporary neuron axes;

(2) calculate the distances along each of said selected temporary neuron axes between the center of said temporary neuron and said existing feature vector; and

(3) reduce said selected temporary neuron axes by amounts proportional to said distances and to the lengths of said selected temporary neuron axes to create said new neuron.

9. The apparatus of claim 8, wherein said creating means select all of the temporary neuron axes for said reduction.

10. The apparatus of claim 8, wherein said creating means select only one of the temporary neuron axes, said creating means calculate a distance in accordance with said selected temporary neuron axis and said existing feature vector, and said creating means reduce said selected temporary neuron axis by said distance to create said new neuron.

11. The apparatus of claim 8, wherein said training input is representative of a character.

12. The apparatus of claim 8, wherein said information representative of said input comprises a feature vector.

13. The apparatus of claim 12, wherein said feature vector comprises a feature element, said feature element being a Grid feature element or a Hadamard feature element.


14. The apparatus of claim 12, wherein said comparing means determine whether said neuron includes said feature vector.

15. The apparatus of claim 12, wherein said comparing means determine a distance measure between said feature vector and said neuron.



16. The apparatus of claim 8, wherein said neuron is a hyper-ellipse or a hyper-rectangle in a k-dimensional feature space, where k is greater than or equal to two.